Use of prosodic features for speech recognition

نویسندگان

  • Keikichi Hirose
  • Nobuaki Minematsu
چکیده

Prosody is known to play an important role in human speech perception process. Therefore, there is an increasing need to use prosodic features for the advancement of speech recognition technology. However, prosody is related to various levels of information, from linguistic, para-linguistic, to non-linguistic, and, therefore, its acoustic manifestation is rather complicated with large variations. This fact prevents prosody to be incorporated in speech recognition process. In the current paper, discussions are given on how we can utilize prosodic features, showing our research works as examples. First, an idea of including word likelihood viewed from the accent type into the recognition process is shown. Second, a scheme of using prosody to control the pruning size in the decoding process is given. Prosodic features should be modeled rather differently form segmental features. Lastly, a new language model constructed by including prosodic events is explained.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract   Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...

متن کامل

The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on the voice features in Parkinson’s disease (PD) is controversial. No study has evaluated the voice features of PD underwent STN-DBS by the acoustic, perceptual, and patient-based assessments comprehensively. Furthermore, there is no study to investigate prosodic features before and after DBS in PD. The curren...

متن کامل

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...

متن کامل

Study on Detection of Prosodic Phrase Boundaries in Spontaneous Speech

Prosodic information, which has the abilities of disambiguation, improving the parsing of the spoken language and predicting recognition errors, becomes more and more important in speech recognition and understanding, especially in spontaneous speech. In this paper, we investigate the detection of the phrase boundaries by prosodic features in the domain-specified Chinese spontaneous speech. The...

متن کامل

Noise Robust Speech Recognition Using Prosodic Information

This paper proposes a noise robust speech recognition method for Japanese utterances using prosodic information. In Japanese, the fundamental frequency (F0) contour conveys phrase intonation and word accent information. Consequently, it also conveys information about prosodic phrase and word boundaries. This paper first proposes a noise robust F0 extraction method using the Hough transform, whi...

متن کامل

A generalised model for utilising prosodic information in continuous speech recognition

Prosodic features in continuous speech provide cues which may be used to disambiguate syntactic ambiguities and to increase the accuracy of speech recognition/understanding systems. This paper presents a novel method using a multivariate statistical framework for producing a model of the relationship between prosodic and syntactic structures in continuous speech. The model can be used for Lingu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004